PSkel: A stencil programming framework for CPU-GPU systems
Authors
Abstract
The use of Graphics Processing Units (GPUs) for high-performance computing has gained growing momentum in recent years. Unfortunately, GPU-programming platforms like CUDA are complex and user-unfriendly, which makes developing high-performance parallel applications harder. In addition, runtime systems that execute those applications often fail to fully utilize the parallelism of modern CPU-GPU systems. Typically, parallel kernels run entirely on the most powerful device available, leaving other devices idle. These observations sparked research in two directions: (1) high-level approaches to software development for GPUs, which strike a balance between performance and ease of programming; and (2) task partitioning to fully utilize the available devices. In this paper, we propose a framework, called PSkel, that provides a single high-level abstraction for stencil programming on heterogeneous CPU-GPU systems, while allowing the programmer to partition and assign data and computation to both CPU and GPU. Our current implementation uses parallel skeletons to transparently leverage Intel TBB and NVIDIA CUDA. In our experiments, we observed that parallel applications with task partitioning can improve average performance by up to 76% and 28% compared to CPU-only and GPU-only parallel applications, respectively.
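To make the idea of partitioning a stencil computation between two devices concrete, here is a minimal sketch in plain Python. It is not the PSkel API: the function names, the `gpu_fraction` parameter, and the choice of a 4-neighbour averaging stencil are all assumptions made for illustration, and both partitions run on the host as stand-ins for the GPU and CPU.

```python
# Illustrative sketch only: NOT the PSkel API. All names here are hypothetical.

def stencil_rows(grid, row_start, row_end):
    """Apply a 4-neighbour averaging stencil to rows [row_start, row_end).

    The function reads from the full input grid, so the rows just outside
    the range play the role of the halo that a real partitioned run would
    have to copy to each device. Out-of-bounds neighbours count as 0.
    """
    height, width = len(grid), len(grid[0])

    def at(i, j):
        return grid[i][j] if 0 <= i < height and 0 <= j < width else 0.0

    return [
        [(at(i - 1, j) + at(i + 1, j) + at(i, j - 1) + at(i, j + 1)) / 4.0
         for j in range(width)]
        for i in range(row_start, row_end)
    ]


def partitioned_stencil(grid, gpu_fraction):
    """Split the rows into two partitions and stitch the results together.

    gpu_fraction (assumed to lie in [0, 1]) is the share of rows assigned
    to the first partition, which stands in for the GPU; the remaining
    rows stand in for the CPU share.
    """
    height = len(grid)
    split = max(0, min(height, int(height * gpu_fraction)))
    gpu_part = stencil_rows(grid, 0, split)       # stand-in for the GPU share
    cpu_part = stencil_rows(grid, split, height)  # stand-in for the CPU share
    return gpu_part + cpu_part
```

For any split point, `partitioned_stencil(grid, f)` should return the same grid as `stencil_rows(grid, 0, len(grid))`. The partitioning changes only where the work runs, not the result.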
Related resources
Fast Cellular Automata Implementation on Graphic Processor Unit (GPU) for Salt and Pepper Noise Removal
Noise removal is commonly applied as a pre-processing step before subsequent image processing tasks, because noise occurs during acquisition or transmission. A common problem in imaging systems that use CMOS or CCD sensors is the appearance of salt and pepper noise. This paper presents a Cellular Automata (CA) framework for noise removal of distorted image by the salt an...
Ultra-Fast Image Reconstruction of Tomosynthesis Mammography Using GPU
Digital Breast Tomosynthesis (DBT) is a technology that creates three-dimensional (3D) images of breast tissue. Tomosynthesis mammography detects lesions that are not detectable with other imaging systems. If image reconstruction time is on the order of seconds, tomosynthesis systems can be used to perform tomosynthesis-guided interventional procedures. This research has been designed to study u...
GPU-UniCache: Automatic Code Generation of Spatial Blocking for Stencils on GPUs
Spatial blocking is a critical memory-access optimization to efficiently exploit the computing resources of parallel processors, such as many-core GPUs. By reusing cache-loaded data over multiple spatial iterations, spatial blocking can significantly lessen the pressure of accessing slow global memory. Stencil computations, for example, can exploit such data reuse via spatial blocking through t...
Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers
GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. As such, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and ma...
CaKernel - A GPGPU Kernel Abstraction and Implementation for Scientific Computing on Heterogeneous Systems
We presented our work to design and implement a GPGPU kernel abstraction, which is suitable for developing highly efficient large scale scientific applications using stencil computations on hybrid CPU/GPU systems. By leveraging the MPI-based data parallelism implemented in Cactus, we have developed a CaKernel programming framework in the CUDA/OpenCL architecture to facilitate the development pr...
Journal title:
- Concurrency and Computation: Practice and Experience
Volume 27, Issue
Pages -
Publication date 2015